
PCT/AU2004/001088 




Patent Office 
Canberra 



I, LEANNE MYNOTT, MANAGER EXAMINATION SUPPORT AND
SALES, hereby certify that annexed is a true copy of the Provisional specification
in connection with Application No. 2003904350 for a patent by
SILVERBROOK RESEARCH PTY LTD. as filed on 15 August 2003.




WITNESS my hand this 
Twenty-fifth day of August 2004 




LEANNE MYNOTT 

MANAGER EXAMINATION SUPPORT 
AND SALES 



PRIORITY 
DOCUMENT 

SUBMITTED OR TRANSMITTED IN 
COMPLIANCE WITH RULE 17.1(a) OR (b)



SYSTEMS, METHODS AND APPARATUS 



Jon Napper and Paul Lapstun
{jon.napper, paul.lapstun}@silverbrookresearch.com

Silverbrook Research Pty Ltd 
393 Darling Street, Balmain NSW Australia 

30 June 2003 



1 Introduction 



The increasing use of pen computing and the emergence of paper-based interfaces to networked
computing resources [10,11] have highlighted the need for techniques to search raw digital ink. Pen-based
computing allows users to store data in the form of digital ink notes and annotations, and
subsequently search this data using hand-written or hand-drawn queries. However, searching raw
digital ink is more difficult than traditional text searching due to variations and inconsistencies in the
production of handwriting and hand-drawn images, and thus methods for improving search accuracy
using domain-specific knowledge, constraints, and contextual information are valuable. This
document discusses a number of novel techniques for improving the accuracy of digital ink
searching.



1.1 Cross-References 

Various methods, systems and apparatus relating to the present invention are disclosed in the
following co-pending applications filed by the applicant or assignee of the present invention. The
disclosures of all of these co-pending applications are incorporated herein by cross-reference. 
5 October 2002: 

Australian Provisional Application 2002952259, "Methods and Apparatus (NPT019)". 
15 October 2002: 

PCT/AU02/01391, PCT/AU02/01392, PCT/AU02/01393, PCT/AU02/01394 and PCT/AU02/01395.

26 November 2001: 

PCT/AU01/01527, PCT/AU01/01528, PCT/AU01/01529, PCT/AU01/01530 and PCT/AU01/01531. 
11 October 2001: 

PCT/AU01/01274. 
14 August 2001: 
PCT/AU01/00996. 

27 November 2000: 

PCT/AU00/01442, PCT/AU00/01444, PCT/AU00/01446, PCT/AU00/01445, [illegible],
PCT/AU00/01453, PCT/AU00/01448, PCT/AU00/01447, PCT/AU00/01459, [illegible],
PCT/AU00/01454, PCT/AU00/01452, PCT/AU00/01443, PCT/AU00/01455, PCT/AU00/01456,
PCT/AU00/01457, PCT/AU00/01458 and PCT/AU00/01449.

20 October 2000: 

PCT/AU00/01273, PCT/AU00/01279, PCT/AU00/01288, PCT/AU00/01282, PCT/AU00/01276,
PCT/AU00/01280, PCT/AU00/01274, PCT/AU00/01289, PCT/AU00/01275, [illegible],
PCT/AU00/01286, PCT/AU00/01281, PCT/AU00/01278, PCT/AU00/01287, PCT/AU00/01285,
PCT/AU00/01284 and PCT/AU00/01283.

15 September 2000:

PCT/AU00/01108, PCT/AU00/01110 and PCT/AU00/01111.
30 June 2000: 

[...] PCT/AU00/00761, PCT/AU00/00760, PCT/AU00/00750, [...]
24 May 2000: 



1.2 Digital Ink Definition 



[...] contact [...]



1.3 Digital Ink Searching 



[...] with the text [...] to match the text. [...] have been described [13] that perform
text matching in the presence of [...] similar to those produced by handwriting
recognition systems.

[...] produced by [...] handwritten query [...] means that this technique
does not work well [...].

[...] the search procedure, and does
not require character or word segmentation [...].

1.4 Digital Ink Search Applications

A number of highly desirable applications are made possible by the combination of digital ink
persistence and digital ink searching, including the ability to search annotations, notes, comments,
and other handwritten information for keywords or phrases. The digital ink search procedure is not
limited to simply matching the query text, as additional attributes can be used to more accurately
specify the desired information. Examples of these attributes include: date and time of writing, the
identity of the pen used to produce the writing, geographic location where the writing took place,
[...] which the ink is annotated (e.g. electronic [...] or notebook), the type of field that
[...] (e.g. a text [...] field), the location of the [...] or text on [...].

Pen-based queries also allow searching for information other than handwriting. Hand-drawn picture
[...] and diagrams in a notebook, [...] can be used to search [...]
collection of digital images. As an example, a hand-drawn picture query could be used to search [...]
library for pictures [...].

1.5 Detailed Description of the Preferred Embodiments 

In the preferred embodiment, the invention is configured to work with the [...]
[...] April 2003. It will be appreciated that not every implementation will necessarily
[...] details and extensions described [...].

2 Domain-Specific Specialization



[...] searching algorithms described above are designed to search a specific type of
[...]. [...] systems proposed in [...] most when [...] printed
[...] Latin-script text. [12,14,16] describe techniques for searching [...].
Similarly, systems can be developed that are optimised for searching other specific
[...], such as [...] characters, technical drawings, or [...].

[...] a specific form of [...] will achieve greater accuracy
[...] techniques, pre-processing and normalization,
the pattern primitives used (e.g. stroke, sub-stroke, stroke group, bitmap image, etc.), the [...]



2.1 Specialization Examples 



[...] techniques can be developed to exploit the key
[...] the potential discriminatory influence of ascender and
[...] can be made at arbitrary orientations
on a page. To improve accuracy, [...] searching algorithms may exploit the fact that most drawings
are rendered using an aggregation of line and shape primitives that may be used to decompose the
image into a canonical form useful for similarity matching.



2.2 Using Specialized Searching



[...] this decision [...]. It is also possible for
[...] of metrics that can accurately differentiate
between Latin-based and oriental [...], and techniques exist to differentiate written text
from hand-drawings [...].

A more flexible system allows individual segments of digital ink to be labelled as a specific digital
ink type, and subsequently searched using algorithms specialized for that particular type. For
example, the system may allow a user to indicate that they generally write using a specific language
(e.g. in English or Chinese) or writing style (e.g. cursive, printed, upper-case, or mixed) and this
information can be used to select the appropriate ink searching mechanism. In addition to this, the
system may allow the user to manually indicate the type of digital ink being generated. For example,
the user could use a number of different pens (e.g. one for handwriting text and another for drawing 
pictures) allowing the system to discriminate between the different ink types. Alternatively, gestures 
or other user-initiated actions could be used to label ink data. 

Another approach to specialized digital ink searching is to require the manual selection of the search 
method when the search query is generated. For example, if the user wishes to search for English 
handwritten text, they write their text query, and then indicate to the system that an English 
handwritten text search should be performed using the specified query. Similarly, if the user wishes 
to search for a hand-drawn picture, they draw their query and indicate to the system to perform a 
drawing search. Since most digital ink searching systems perform some kind of pre-processing or
indexing at the time of ink generation (rather than when the query is generated) to ensure a fast
response to ink search queries, delaying the search strategy decision until the point at which the
search is initiated means that either

• the ink data must be pre-processed multiple times and stored in multiple formats (i.e. once
for each search strategy), or

• the pre-processing must be delayed until the search is initiated (thus increasing the time it
takes to generate the search results).

The improvement in the accuracy of the ink search may justify the increased resource utilization
required by this technique.
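
The labelling and strategy selection described in this section can be illustrated with a minimal Python sketch. It is not the implementation of the specification: the `InkSegment` type, the type labels "text" and "drawing", and both matching functions are hypothetical stand-ins, and plain strings are used in place of real stroke data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InkSegment:
    ink_type: str  # label assigned when the ink was captured (assumed labels)
    data: str      # placeholder for real stroke data


def match_text(query: str, segment: InkSegment) -> bool:
    # Stand-in for a handwriting-specific matcher: case-insensitive substring.
    return query.lower() in segment.data.lower()


def match_drawing(query: str, segment: InkSegment) -> bool:
    # Stand-in for a drawing-specific matcher: exact match on a shape name.
    return query == segment.data


# One specialized strategy per ink type.
STRATEGIES: Dict[str, Callable[[str, InkSegment], bool]] = {
    "text": match_text,
    "drawing": match_drawing,
}


def search(query: str, query_type: str, segments: List[InkSegment]) -> List[InkSegment]:
    """Search only segments labelled with query_type, using that type's strategy."""
    strategy = STRATEGIES[query_type]
    return [s for s in segments if s.ink_type == query_type and strategy(query, s)]
```

Here the query type is supplied when the search is issued, which corresponds to the manual selection approach; the segment labels correspond to the labelling approach.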

3 Specialization Using Context Information 

In addition to the techniques described above, the application of specialized digital ink searching
techniques can be determined from the context (i.e. the contents of the page or document on which
the ink was written) of the digital ink. Interpreting the information contained in the layout and
definition of a document can guide the selection of the ink search strategy.

3.1 Language/Script Identification 

It is reasonable to assume that annotations and comments made on a printed document will usually be
written in the same language as the text contained in the document itself. Thus, if the natural
language of a document (i.e. the language that the text in the document was written in) can be
determined, specialized ink search strategies can be used to search digital ink annotations contained
on that document.



Many document formats allow the explicit definition of the natural language of the document. For
example, in HTML/XHTML [6,7] the "lang" attribute can be used:



<HTML lang="en" dir="rtl"></HTML>



where the language is identified by a two-letter code (e.g. "en" for English, "es" for Spanish, etc.) as
[...]. The example also shows the ability to specify the text direction ("dir") as right-to-left
("rtl") or left-to-right ("ltr"), another assumed characteristic of the digital ink that can be used
when performing digital ink searching. Similarly, in the XML/XFORMS document specification "a
special attribute named "xml:lang" may be inserted in documents to specify the language used in the
contents and attribute values of any element in an XML document" [4]:



<TITLE xml:lang="fr">XForms en XHTML</TITLE>
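
As a rough illustration of how a search system might read these attributes, the following Python sketch uses the standard library `html.parser` module to pull the language and text direction from the root element of a document. The class and function names are hypothetical, and the sketch only inspects the `html` element rather than every element that may carry a language attribute.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class LangDirParser(HTMLParser):
    """Records the lang (or xml:lang) and dir attributes of the <html> element."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.direction: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "html" and self.lang is None:
            attributes = dict(attrs)
            self.lang = attributes.get("lang") or attributes.get("xml:lang")
            self.direction = attributes.get("dir")


def document_language(markup: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (language code, text direction); either may be None if absent."""
    parser = LangDirParser()
    parser.feed(markup)
    return parser.lang, parser.direction
```

The returned language code and direction could then be used to pick a language-specific ink search strategy for annotations on that document.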

The Adobe Portable Document Format (PDF) defines the "Lang" attribute, a "language identifier
specifying the natural language for all text" [5]. The identifier can be used in the document catalog
(thus specifying the language of the entire document), in any structured element, or in marked-content
sequences:

/Span << /Lang (fr) >>
BDC
(Bonjour.) Tj
EMC

Documents may also use the Dublin Core metadata element set, "a standard for cross-domain
information resource description" [...] that identifies the language associated with a [...]
language codes [2,3]. Dublin Core metadata conforms to the World Wide Web
[...] Description [...] and can be [...] HTML and [...].

If a document format does not allow the specification of the document language, or the language
specification attribute is missing, the language of the document may [...]
techniques. For example, the use of a particular font will often imply that the document is authored
in a particular language or script. In some document formats (such as PDF [5]) font [...] contain a
language attribute that indicates the natural language of the font. In addition to [...]
techniques that allow the language of a document to be accurately determined using dictionaries [...].

[...] searching techniques [...] optimised for a specific script (e.g.
[...], oriental characters, Arabic characters, etc.) that includes a group of languages,
rather than being language specific. Obviously, any technique that exploits language identification for
specialization can also be used for language script based specialization, since identification of the
language script is usually trivial once the language is known.
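
The step from a known language to its script can be sketched as a simple lookup. The table below is a small illustrative sample chosen for this example, not a list taken from the specification, and the function name is hypothetical.

```python
from typing import Optional

# Illustrative sample only: a real system would cover many more languages.
LANGUAGE_TO_SCRIPT = {
    "en": "Latin",
    "fr": "Latin",
    "es": "Latin",
    "ar": "Arabic",
    "zh": "Han",
}


def script_for_language(language_code: str) -> Optional[str]:
    """Map a language code such as "en" or "en-US" to a script name.

    Only the primary subtag (the part before the first hyphen) is used.
    Returns None when the language is not in the table.
    """
    primary = language_code.split("-")[0].lower()
    return LANGUAGE_TO_SCRIPT.get(primary)
```

Several languages map to the same script, so a script-level search strategy can be shared across them.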



3.2 Field Labels



[...] applications, [...] either [...] a keyboard for screen-based
applications or handwritten for pen computing or paper documents, must give the user some
[...] of the information that is required. This is usually done by labelling each data input
area (or field) with a descriptive identifier, for example, "First Name", "Last Name", "Address",
"Phone Number", and so on. For printed forms, this information appears as printed [...] on the paper.

The information contained in the field labels described above can be used to determine the digital ink
[...] by analysing the form description to associate
each field label with the appropriate data entry region [...]. Once each label is related with an
entry field, the [...] string [...] (possibly regular expression matching) and the
[...] search [...] the following [...]:



Ink Type    Field Label

Text        First Name, Given Name, Surname, Family Name, Address, Suburb, Town,
            State, Country, Region, Email Address

[Number]    Phone number, Age, Number, Size, Count, Zip Code, Post Code, Date, Time,
            Credit Card Number, Customer Number

Drawing     Picture, Drawing, Image, Diagram
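
A minimal sketch of label-based selection using regular expression matching is given below. The keywords come from the table of field labels; the type names "text", "number" and "drawing", the ordering of the patterns, and the function name are assumptions made for this example.

```python
import re
from typing import Optional

# Patterns are tried in order. "number" comes first so that labels such as
# "Credit Card Number" are not claimed by a text keyword. Word boundaries
# stop "Image" from matching the keyword "age".
LABEL_PATTERNS = [
    ("number", re.compile(
        r"\b(?:phone|age|number|size|count|zip code|post code|date|time)\b", re.I)),
    ("drawing", re.compile(r"\b(?:picture|drawing|image|diagram)\b", re.I)),
    ("text", re.compile(
        r"\b(?:name|surname|address|suburb|town|state|country|region)\b", re.I)),
]


def ink_type_for_label(label: str) -> Optional[str]:
    """Return the expected ink type for a field label, or None if unrecognised."""
    for ink_type, pattern in LABEL_PATTERNS:
        if pattern.search(label):
            return ink_type
    return None
```

A label that matches nothing returns None, leaving the caller free to fall back to a general-purpose search strategy.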

3.3 Field Attributes



In addition to the field type, form definitions often contain information regarding the type of data that 
should be entered in each field. This information is usually contained in attributes that are associated 
with a specific field. For example, some input field types have a flag indicating that the value entered 
must be numeric. A digital ink searching system can use this information to select a numeric search 
strategy for ink contained in the associated data input area. 

In addition to using standard form field attributes to improve the accuracy of digital ink searching, 
digital ink search specific information can be added to fields using custom attributes. This 
information is only used if the document is processed using a digital ink searching system; the 
document can still be used normally where required (e.g. printed or displayed in a web browser) since
processing systems generally ignore the unrecognised custom attributes. However, if digital ink 
searching is required, the custom parameters can be used to improve the accuracy of the search 
results. 
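
One possible way to combine standard and custom field attributes is sketched below. The attribute names `ink-search-type` and `numeric` are hypothetical and stand for whatever custom attribute and numeric flag a particular form format provides; the fallback to "text" is also an assumption.

```python
from typing import Mapping


def strategy_for_field(attributes: Mapping[str, str]) -> str:
    """Choose an ink search strategy name from a field's attributes.

    A custom, search-specific attribute takes priority; otherwise a standard
    numeric flag is honoured; otherwise a text search is assumed.
    """
    custom = attributes.get("ink-search-type")  # hypothetical custom attribute
    if custom:
        return custom
    if attributes.get("numeric", "").lower() == "true":  # hypothetical flag
        return "number"
    return "text"
```

Systems that do not know the custom attribute simply never look it up, which mirrors the way unrecognised attributes are ignored during normal printing or display.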



4 Conclusion 



A number of techniques to improve the accuracy of digital ink search were discussed, including a
method of selecting a digital ink searching strategy from a set of specialized strategies based on the
expected ink data type. Examples of specialized digital ink searching strategies were given, along
with a number of methods for integrating these strategies into a system. In addition to this, methods
for selecting digital ink search strategies based on the layout and definition of a document were
discussed.

Figure 1. Digital Ink Searching Using Specialization